29 research outputs found

Novel geometric features for off-line writer identification

Author: Al-Maadeed Somaya
Bouridane Ahmed
Hassaine Abdelaali
Tahir Muhammad
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/08/2016

Writer identification is an important field in forensic document examination. Typically, a writer identification system consists of two main steps, feature extraction and matching, and its performance depends significantly on the feature extraction step. In this paper, we propose a set of novel geometrical features that are able to characterize different writers. These features include direction, curvature, and tortuosity. We also propose an improvement of the edge-based directional and chain code-based features. The proposed methods are applicable to Arabic and English handwriting. We have also studied several methods for computing the distance between feature vectors when comparing two writers. Evaluation of the methods is performed using both the IAM handwriting database and the QUWI database for each individual feature, reaching Top-1 identification rates of 82% and 87% on those two datasets, respectively. The accuracies achieved by Kernel Discriminant Analysis (KDA) are significantly higher than those observed before feature-level writer identification was implemented. The results demonstrate the effectiveness of the improved versions of both chain-code features and edge-based directional features.
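The matching step described in this abstract, comparing feature vectors with a distance and keeping the closest writer, might look roughly like the sketch below. The abstract does not name the distances that were studied, so the Manhattan and chi-square distances, the function names, and the toy feature vectors are all assumptions made for illustration.

```python
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def manhattan(a: Vector, b: Vector) -> float:
    """Sum of absolute differences (an assumed choice of distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chi_square(a: Vector, b: Vector) -> float:
    """Chi-square distance between two histograms (also an assumed choice)."""
    total = 0.0
    for x, y in zip(a, b):
        if x + y > 0:  # skip empty bins to avoid dividing by zero
            total += (x - y) ** 2 / (x + y)
    return total


def top1_writer(query: Vector,
                references: Dict[str, Vector],
                distance: Callable[[Vector, Vector], float]) -> str:
    """Return the reference writer whose feature vector is closest to the query."""
    return min(references, key=lambda writer: distance(query, references[writer]))


# Hypothetical 3-bin direction histograms for two known writers.
references = {
    "writer_a": [0.5, 0.3, 0.2],
    "writer_b": [0.1, 0.2, 0.7],
}
query = [0.45, 0.35, 0.20]

print(top1_writer(query, references, manhattan))
print(top1_writer(query, references, chi_square))
```

A Top-1 identification rate is then the fraction of query documents for which the returned writer is the true one.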

Northumbria Research Link

Novel geometric features for off-line writer identification

Author: Abdelaali Hassaine
Ahmed Bouridane
Muhammad Atif Tahir
Somaya Al-Maadeed
Publication venue: Springer Nature
Publication date: 01/01/2014

Qatar University Institutional Repository

Springer - Publisher Connector

Sentiment Analysis in Comments Associated to News Articles: Application to Al Jazeera Comments

Author: Al-Kubaisi Khalid
Hassaine Abdelaali
Jaoua Ali
Publication venue: 'Hamad bin Khalifa University Press (HBKU Press)'
Publication date: 01/01/2016

Sentiment analysis is an important research task that aims to understand the general sentiment of a specific community or group of people. Sentiment analysis of Arabic content is still in its early stages of development. In the scope of Islamic content mining, sentiment analysis helps in understanding which topics Muslims around the world are discussing, which topics are trending, and which topics will trend in the future. This study was conducted on a dataset of 5000 comments on news articles collected from the Al Jazeera Arabic website. All articles were about the recent war against the Islamic State. The database was annotated using Crowdflower, a website for crowdsourcing dataset annotations. Users manually selected whether the sentiment associated with each comment was positive, negative, or neutral. Each comment was annotated by four different users, and each annotation is associated with a confidence level between 0 and 1. The confidence level reflects how far the users who annotated the same comment agreed (1 corresponds to full agreement between the four annotators and 0 to full disagreement). Our method represents the corpus by a binary relation between the set of comments (x) and the set of words (y). A relation exists between the comment (x) and the word (y) if, and only if, (x) contains (y). Three binary relations are created, one each for comments associated with positive, negative, and neutral sentiments. Our method then extracts keywords from the obtained binary relations using the hyper concept method [1]. This method decomposes the original relation into non-overlapping rectangles and highlights the most representative keyword for each rectangle. The output is a list of keywords sorted in a hierarchical ordering of importance.
The keyword lists obtained for positive, negative, and neutral comments are fed into a random forest classifier of 1000 random trees in order to predict the sentiment associated with each comment of the test set. Experiments were conducted after splitting the database into 70% training and 30% testing subsets. Our method achieves a correct classification rate of 71% when considering annotations with all values of confidence, and 89% when considering only annotations with a confidence value equal to 1. These results are very promising and testify to the relevance of the extracted keywords. In conclusion, the hyper concept method extracts discriminative keywords that are used to successfully distinguish between comments containing positive, negative, and neutral sentiments. Future work includes performing further experiments using a varying threshold level for the confidence value. Moreover, by applying a part-of-speech tagger, it is planned to perform keyword extraction on words corresponding to specific grammatical roles (adjectives, verbs, nouns, etc.). Finally, it is also planned to test this method on publicly available datasets such as the Rotten Tomatoes Movie Reviews dataset [2]. Acknowledgment: This contribution was made possible by NPRP grant #06-1220-1-233 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
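The binary relation between comments and words described in this abstract can be sketched as follows. This is only an illustration of the "comment x contains word y" relation, built once per sentiment class. The whitespace tokenisation, the English toy comments, and the function names are assumptions, and the hyper concept keyword extraction itself is not reproduced here.

```python
from typing import Dict, List, Set

Relation = Dict[int, Set[str]]


def build_relation(comments: List[str]) -> Relation:
    """Map each comment index x to the set of words y it contains."""
    # Whitespace tokenisation is a simplification assumed for this sketch.
    return {i: set(text.lower().split()) for i, text in enumerate(comments)}


def related(relation: Relation, x: int, y: str) -> bool:
    """True if, and only if, comment x contains word y."""
    return y in relation.get(x, set())


# Hypothetical toy corpus; the study used Arabic comments from Al Jazeera.
comments_by_sentiment = {
    "positive": ["great news today", "great result"],
    "negative": ["terrible news"],
    "neutral": ["news report today"],
}

# One binary relation per sentiment class, as the abstract describes.
relations = {
    label: build_relation(texts)
    for label, texts in comments_by_sentiment.items()
}

print(related(relations["positive"], 0, "great"))
print(related(relations["negative"], 0, "great"))
```

Keywords extracted from each relation would then serve as features for the classifier.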

Qatar University Institutional Repository

Crossref

Named Entity Disambiguation using Hierarchical Text Categorization

Author: Al Otaibi Jameela
Hassaine Abdelaali
Jaoua Ali
Publication venue: 'Hamad bin Khalifa University Press (HBKU Press)'
Publication date: 01/01/2016

Named entity extraction is an important step in natural language processing. It aims at finding the entities present in a text, such as organizations, places, or persons. Named entity extraction is of paramount importance in automatic translation, as different named entities are translated differently. Named entities are also very useful for advanced search engines that aim to retrieve detailed information about a specific entity. Named entity extraction is a difficult problem because it usually requires a disambiguation step: the same word might belong to different named entities depending on the context. This work was conducted on the ANERCorp named entities database. This Arabic database contains four different named entities: person, organization, location, and miscellaneous. The database contains 6099 sentences, of which 60% are used for training, 20% for validation, and 20% for testing. Our method for named entity extraction contains two main steps: the first step predicts the list of named entities present at the sentence level, and the second step predicts the named entity of each word of the sentence. To predict the list of named entities at the sentence level, the document is first separated into sentences using punctuation marks. Subsequently, a binary relation between the set of sentences (x) and the set of words (y) is created from the obtained list of sentences. A relation exists between the sentence (x) and the word (y) if, and only if, (x) contains (y). A binary relation is created for each category of named entities (person, organization, location, and miscellaneous). If a sentence contains several named entities, it is duplicated in the relation corresponding to each one of them. Our method then extracts keywords from the obtained binary relations using the hyper concept method [1].
This method decomposes the original relation into non-overlapping rectangles and highlights the most representative keyword for each rectangle. The output is a list of keywords sorted in a hierarchical ordering of importance. The keyword lists obtained for each category of named entities are fed into a random forest classifier of 10000 random trees in order to predict the list of named entities associated with each sentence. For each sentence, the random forest classifier produces the list of probabilities corresponding to the existence of each category of named entities within the sentence: Random Forest [sentence(i)] = (P(Person), P(Organization), P(Location), P(Miscellaneous)). Subsequently, the sentence is associated with the named entities for which the corresponding probability is larger than a threshold set empirically on the validation set. In the second step, we create a lookup table associating each word in the database with the list of named entities to which it corresponds in the training set. For unseen sentences of the test set, the list of named entities predicted at the sentence level is produced, and for each word, the list of predicted named entities is also produced using the lookup table previously built. Ultimately, for each word, the intersection between the two predicted lists of named entities (at the sentence and the word level) gives the final predicted named entity. Where more than one named entity is produced at this stage, the one with the maximum probability is kept. We obtained an accuracy of 76.58% when considering only lookup tables of named entities produced at the word level. When performing the intersection with the list produced at the sentence level, the accuracy reaches 77.96%. In conclusion, hierarchical named entity extraction leads to improved results over direct extraction. Future work includes the use of other linguistic features and a larger lookup table in order to improve the results.
Validation on other state-of-the-art databases is also considered. Acknowledgements: This contribution was made possible by NPRP grant #06-1220-1-233 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
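The final combination step described in this abstract, intersecting the sentence-level and word-level predictions and keeping the most probable category, could be sketched as below. The probabilities, the 0.5 threshold, the lookup table contents, and the function name are hypothetical. Returning `None` when the intersection is empty is also an assumption, since the abstract does not say how that case is handled.

```python
from typing import Dict, Optional, Set


def predict_word_entity(word: str,
                        sentence_probs: Dict[str, float],
                        lookup: Dict[str, Set[str]],
                        threshold: float) -> Optional[str]:
    """Combine sentence-level and word-level predictions for one word."""
    # Sentence level: categories whose probability exceeds the threshold.
    sentence_entities = {e for e, p in sentence_probs.items() if p > threshold}
    # Word level: categories seen for this word in the training set.
    word_entities = lookup.get(word, set())
    candidates = sentence_entities & word_entities
    if not candidates:
        return None  # assumed behaviour; not specified in the abstract
    # If several categories remain, keep the one with the highest probability.
    return max(candidates, key=lambda e: sentence_probs[e])


# Hypothetical classifier output for one sentence.
sentence_probs = {
    "person": 0.7,
    "organization": 0.2,
    "location": 0.6,
    "miscellaneous": 0.1,
}
# Hypothetical lookup table built from training data.
lookup = {
    "paris": {"location", "person"},
    "bank": {"organization"},
}

print(predict_word_entity("paris", sentence_probs, lookup, threshold=0.5))
print(predict_word_entity("bank", sentence_probs, lookup, threshold=0.5))
```

In the abstract's setup the threshold is tuned on the validation set rather than fixed in advance.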

Qatar University Institutional Repository

Crossref

An explainable Transformer-based deep learning model for the prediction of incident heart failure

Author: Canoy Dexter
Cleland John
Hassaine Abdelaali
Li Yikuan
Lukasiewicz Thomas
Rahimi Kazem
Ramakrishnan Rema
Rao Shishir
Salimi-Khorshidi Gholamreza
Publication date: 27/01/2021

Predicting the incidence of complex chronic conditions such as heart failure is challenging. Deep learning models applied to rich electronic health records may improve prediction but remain unexplainable, hampering their wider use in medical practice. We developed a novel Transformer deep-learning model for more accurate and yet explainable prediction of incident heart failure involving 100,071 patients from longitudinal linked electronic health records across the UK. On internal 5-fold cross validation and held-out external validation, our model achieved 0.93 and 0.93 area under the receiver operating characteristic curve and 0.69 and 0.70 area under the precision-recall curve, respectively, and outperformed existing deep learning models. Predictor groups included all community and hospital diagnoses and medications contextualised within the age and calendar year for each patient's clinical encounter. The importance of contextualised medical information was revealed in a number of sensitivity analyses, and our perturbation method provided a way of identifying factors contributing to risk. Many of the identified risk factors were consistent with existing knowledge from clinical and epidemiological research, but several new associations were revealed which had not been considered in expert-driven risk prediction models.

arXiv.org e-Print Archive

Oxford University Research Archive

Enlighten

Novel geometric features for off-line writer identification

Author: Abdelaali Hassaine
Ahmed Bouridane
Muhammad Atif Tahir
Somaya Al-Maadeed
Publication venue: 'Springer Science and Business Media LLC'

Crossref

COVID-19 trajectories among 57 million adults in England: a cohort study using electronic health records

Author: Abbasizanjani Hoda
Ahmed Badar
Ahmed Nida
Akbari Ashley
Akinoso-Imran Abdul Qadr
Allara Elias
Allery Freya
Angelantonio Emanuele Di
Ashworth Mark
Ayyar-Gupta Vandana
Babu-Narayan Sonya
Bacon Seb
Ball Steve
Banerjee Ami
Banerjee Amitava
Barber Mark
Barrett Jessica
Bennie Marion
Berry Colin
Beveridge Jennifer
Birney Ewan
Bojanić Lana
Bolton Thomas
Bone Anna
Boyle Jon
Braithwaite Tasanee
Bray Ben
Briffa Norman
Brind David
Brown Katherine
Buch Maya
Canoy Dexter
Caputo Massimo
Carragher Raymond
Carson Alan
Cezard Genevieve
Chang Jen-Yu Amy
Cheema Kate
Chin Richard
Chudasama Yogini
Cooper Jennifer
Copland Emma
Crallan Rebecca
Cripps Rachel
Cromwell David
Curcin Vasa
Curry Gwenetta
Dale Caroline
Danesh John
Das-Munshi Jayati
Dashtban Ashkan
Davies Alun
Davies Gareth
Davies Joanna
Davies Neil
Day Joshua
Delmestri Antonella
Denaxas Spiros
Denholm Rachel
Dennis John
Denniston Alastair
Deo Salil
Dhillon Baljean
Docherty Annemarie
Dong Tim
Douiri Abdel
Downs Johnny
Dregan Alexandru
Ellins Elizabeth A
Elwenspoek Martha
Falck Fabian
Falter Florian
Fan Yat Yi
Firth Joseph
Fraser Lorna
Friebel Rocco
Gavrieli Amir
Gerstung Moritz
Gilbert Ruth
Gillies Clare
Glickman Myer
Goldacre Ben
Goldacre Raph
Greaves Felix
Green Mark
Grieco Luca
Griffiths Rowena
Gurdasani Deepti
Halcox Julian
Hall Nick
Hama Tuankasfee
Handy Alex
Hansell Anna
Hardelid Pia
Hardy Flavien
Harris Daniel
Harrison Camille
Harron Katie
Hassaine Abdelaali
Hassan Lamiece
Healey Russell
Hemingway Harry
Henderson Angela
Herz Naomi
Heyl Johannes
Hidajat Mira
Higginson Irene
Hinchliffe Rosie
Hippisley-Cox Julia
Ho Frederick
Hocaoglu Mevhibe
Hollings Sam
Horne Elsie
Hughes David
Humberstone Ben
Inouye Mike
Ip Samantha
Islam Nazrul
Jackson Caroline
Jenkins David
Jiang Xiyun
Johnson Shane
Kadam Umesh
Kallis Costas
Karim Zainab
Kasan Jake
Katsoulis Michalis
Kavanagh Kim
Kee Frank
Keene Spencer
Kent Seamus
Khalid Sara
Khawaja Anthony
Khunti Kamlesh
Killick Richard
Kinnear Deborah
Knight Rochelle
Kolamunnage-Dona Ruwanthi
Kontopantelis Evan
Kurdi Amanj
Lacey Ben
Lai Alvina
Lai Alvina G
Lambarth Andrew
Larzjan Milad Nazarzadeh
Lawler Deborah
Lawrence Thomas
Lawson Claire
Li Ken
Li Kezhi
Li Qiuju
Llinares Miguel Bernabeu
Lorgelly Paula
Lowe Deborah
Lyons Jane
Lyons Ronan
Machado Pedro
Macleod John
Macleod Mary Joan
Malgapo Evaleen
Mamas Mamas
Mamouei Mohammad
Manohar Sinduja
Mapeta Rutendo
Martelli Javiera Leniz
Martos David Moreno
Mateen Bilal
Mateen Bilal A
McCarthy Aoife
Melville Craig
Milton Rebecca
Mizani Mehrdad
Mizani Mehrdad A
Moncusi Marta Pineda
Morales Daniel
Mordi Ify
Morrice Lynn
Morris Carole
Morris Eva
Mu Yi
Mueller Tanja
Murdock Lars
Nafilyan Vahé
Nicholson George
Nikiphorou Elena
Nolan John
Norris Ruth
Norris Tom
North Laura
North Teri-Louise
O'Connell Dan
Oliver Dominic
Oluyase Adejoke
Olvera-Barrios Abraham
Omigie Efosa
Onida Sarah
Padmanabhan Sandosh
Pagel Christina
Palmer Tom
Pasea Laura
Patel Riyaz
Payne Rupert
Pell Jill
Petitjean Carmen
Pherwani Arun
Pickrell Owen
Pierotti Livia
Pirmohamed Munir
Priedon Rouven
Prieto-Alhambra Dani
Proudfoot Alastair
Quinn Terry
Quint Jennifer
Raffetti Elena
Rahimi Kazem
Rao Shishir
Razieh Cameron
Roberts Brian
Rogers Caroline
Rossdale Jennifer
Salim Safa
Samani Nilesh
Sattar Naveed
Schnier Christian
Schwartz Roy
Selby David
Seminog Olena
Shabnam Sharmin
Shah Ajay
Shelton Jon
Sheppard James
Sinha Shubhra
Skrypak Mirek
Slapkova Martina
Sleeman Katherine
Smith Craig
Sofat Reecha
Sosenko Filip
Sperrin Matthew
Steeg Sarah
Sterne Jonathan
Sterne Jonathan AC
Stoica Serban
Sudell Maria
Sudlow Cathie
Sun Luanluan
Suseeladevi Arun Karthikeyan
Sweeting Michael
Sydes Matt
Takhar Rohan
Tang Howard
Thygesen Johan
Thygesen Johan H
Tilston George
Tochel Claire
Toit Clea du
Tomlinson Christopher
Toms Renin
Torabi Fatemeh
Torralbo Ana
Townson Julia
Tufail Adnan
Tungamirai Tapiwa
Varma Susheel
Vollmer Sebastian
Walker Venexia
Wang Huan
Wang Tianxiao
Warwick Alasdair
Watkinson Ruth
Watson Harry
Whiteley William
Whiteley William N
Whittaker Hannah
Wilde Harry
Wilkinson Tim
Williams Gareth
Williams Michelle
Williams Richard
Withnell Eloise
Wolfe Charles
Wood Angela
Wright Lucy
Wu Honghan
Wu Jianhua
Wu Jinge
Yates Tom
Zaccardi Francesco
Zhang Haoting
Zhang Huayu
Zuccolo Luisa
Publication venue: 'Elsevier BV'
Publication date: 09/06/2022

BACKGROUND: Updatable estimates of COVID-19 onset, progression, and trajectories underpin pandemic mitigation efforts. To identify and characterise disease trajectories, we aimed to define and validate ten COVID-19 phenotypes from nationwide linked electronic health records (EHR) using an extensible framework. METHODS: In this cohort study, we used eight linked National Health Service (NHS) datasets for people in England alive on Jan 23, 2020. Data on COVID-19 testing, vaccination, primary and secondary care records, and death registrations were collected until Nov 30, 2021. We defined ten COVID-19 phenotypes reflecting clinically relevant stages of disease severity and encompassing five categories: positive SARS-CoV-2 test, primary care diagnosis, hospital admission, ventilation modality (four phenotypes), and death (three phenotypes). We constructed patient trajectories illustrating transition frequency and duration between phenotypes. Analyses were stratified by pandemic waves and vaccination status. FINDINGS: Among 57 032 174 individuals included in the cohort, 13 990 423 COVID-19 events were identified in 7 244 925 individuals, equating to an infection rate of 12·7% during the study period. Of 7 244 925 individuals, 460 737 (6·4%) were admitted to hospital and 158 020 (2·2%) died. Of 460 737 individuals who were admitted to hospital, 48 847 (10·6%) were admitted to the intensive care unit (ICU), 69 090 (15·0%) received non-invasive ventilation, and 25 928 (5·6%) received invasive ventilation. Among 384 135 patients who were admitted to hospital but did not require ventilation, mortality was higher in wave 1 (23 485 [30·4%] of 77 202 patients) than wave 2 (44 220 [23·1%] of 191 528 patients), but remained unchanged for patients admitted to the ICU. Mortality was highest among patients who received ventilatory support outside of the ICU in wave 1 (2569 [50·7%] of 5063 patients). 
15 486 (9·8%) of 158 020 COVID-19-related deaths occurred within 28 days of the first COVID-19 event without a COVID-19 diagnosis on the death certificate. 10 884 (6·9%) of 158 020 deaths were identified exclusively from mortality data with no previous COVID-19 phenotype recorded. We observed longer patient trajectories in wave 2 than wave 1. INTERPRETATION: Our analyses illustrate the wide spectrum of disease trajectories as shown by differences in incidence, survival, and clinical pathways. We have provided a modular analytical framework that can be used to monitor the impact of the pandemic and generate evidence of clinical and policy relevance using multiple EHR sources. FUNDING: British Heart Foundation Data Science Centre, led by Health Data Research UK.
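The percentages quoted in the FINDINGS can be recomputed from the counts given in the abstract. The short check below is only an arithmetic sketch: the counts are copied from the text, while the helper name `pct` and the dictionary labels are introduced here for illustration.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * numerator / denominator, 1)


# (numerator, denominator) pairs taken from the FINDINGS text.
reported = {
    "infected / cohort": (7_244_925, 57_032_174),
    "hospitalised / infected": (460_737, 7_244_925),
    "died / infected": (158_020, 7_244_925),
    "ICU / hospitalised": (48_847, 460_737),
    "non-invasive ventilation / hospitalised": (69_090, 460_737),
    "invasive ventilation / hospitalised": (25_928, 460_737),
    "wave 1 deaths, no ventilation": (23_485, 77_202),
    "wave 2 deaths, no ventilation": (44_220, 191_528),
    "wave 1 deaths, ventilated outside ICU": (2_569, 5_063),
    "deaths without COVID-19 on certificate": (15_486, 158_020),
    "deaths found only in mortality data": (10_884, 158_020),
}

for label, (n, d) in reported.items():
    print(f"{label}: {pct(n, d)}%")
```

Each recomputed value matches the corresponding figure quoted in the abstract.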

UCL Discovery